Public Datasets in the Cloud - Rosalyn Metz and Michael B. Klein


Topics: c4l10_staging


When most people think about cloud computing (if they think about it at all), it usually takes one of two forms: Infrastructure Services, such as Amazon EC2 and GoGrid, which provide raw, elastic computing capacity in the form of virtual servers; and Platform Services, such as Google App Engine and Heroku, which provide preconfigured application stacks and specialized deployment tools. Several providers, however, also offer access to large public datasets that would be impractical for most organizations to download and work with locally. From a 67-gigabyte dump of DBpedia's structured information store to a 180-gigabyte snapshot of astronomical data from the Sloan Digital Sky Survey, and from chemistry and biology to economic and geographic data, these datasets are available instantly and backed by enough pay-as-you-go server capacity to make good use of them. We will present an overview of the currently available datasets, explain what it takes to create and use snapshots of the data, and explore how the library community might push some of its own large stores of data and metadata into the cloud.
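As a rough illustration of the "create and use snapshots" workflow the abstract refers to, the sketch below shows how an AWS public dataset snapshot is typically turned into a mountable volume on a running EC2 instance. All of the IDs, zone, device, and mount point are hypothetical placeholders, and the commands use the modern `aws` CLI rather than the 2010-era `ec2-*` tools; the AWS calls themselves are shown as comments so the sketch runs without credentials.

```shell
# Minimal sketch: attach a public dataset snapshot as an EBS volume.
# SNAPSHOT_ID, the vol-/i- IDs, ZONE, and DEVICE are hypothetical
# placeholders, not real identifiers.
SNAPSHOT_ID="snap-00000000"   # hypothetical public dataset snapshot
ZONE="us-east-1a"             # must match the instance's availability zone
DEVICE="/dev/sdf"             # device name the volume will attach under

# The three steps, left as comments so this runs without AWS access:
# 1. Create a volume from the public snapshot (no local download):
#      aws ec2 create-volume --snapshot-id "$SNAPSHOT_ID" --availability-zone "$ZONE"
# 2. Attach the new volume to a running instance:
#      aws ec2 attach-volume --volume-id vol-00000000 --instance-id i-00000000 --device "$DEVICE"
# 3. Mount it; the data is readable immediately:
#      sudo mount "$DEVICE" /mnt/dataset

echo "volume from $SNAPSHOT_ID in $ZONE attached at $DEVICE"
```

Because the data lives in an EBS snapshot rather than a file you download, the per-organization cost is a volume and an instance for as long as you need them, not 180 gigabytes of local storage and transfer.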


Run time: 16 minutes 13 seconds
Audio/visual: sound


In collection: Community Video
Uploaded by ksclarke on 5/12/2010